1,845 research outputs found

    Termites in the woodwork

    Full text link
    Termites eat and digest wood, but how do they do it? Combining advanced genomics and proteomics techniques, researchers have now shown that microbes found in the termites' hindguts possess just the right tools. Most animals, from insects to mammals, carry complex communities of microbes in their digestive tracts. In the case of wood-eating termites, these gut microbes are particularly important: they are thought to provide most of the capabilities needed for efficient digestion of wood, which is otherwise a largely inaccessible food source. They also help to compensate for the paucity of some nutrients in wood, for example by fixing atmospheric nitrogen, and they synthesize essential amino acids and other compounds for their hosts [1, 2]. Despite their importance, relatively little is known about gut microbes in termites. This is partly because gut microbes are often difficult to grow in pure culture (as is the case for most microbes sampled from natural environments). Furthermore, a single termite can harbor a very complex assemblage of hundreds of different microbial lineages, whose members may vary widely in terms of abundance and growth rates. Without access to cultivated strains, researchers have to rely on so-called 'cultivation-independent' molecular techniques to analyze such communities. A clever combination of these techniques has now been applied to a section of the termite hindgut, aiming to identify molecular tools used by the microbes in this compartment to degrade wood [3]. Here, we review the procedures and results of this study, and discuss insights into the biological system as well as implications for the generation of biofuels

    Termites in the woodwork

    Get PDF
    Metagenomic analysis of termite gut flora reveals a diversity of wood-degrading enzymes

    HPC-CLUST: distributed hierarchical clustering for large sets of nucleotide sequences

    Get PDF
    Motivation: Nucleotide sequence data are being produced at an ever increasing rate. Clustering such sequences by similarity is often an essential first step in their analysis—intended to reduce redundancy, define gene families or suggest taxonomic units. Exact clustering algorithms, such as hierarchical clustering, scale relatively poorly in terms of run time and memory usage, yet they are desirable because heuristic shortcuts taken during clustering might have unintended consequences in later analysis steps. Results: Here we present HPC-CLUST, a highly optimized software pipeline that can cluster large numbers of pre-aligned DNA sequences by running on distributed computing hardware. It allocates both memory and computing resources efficiently, and can process more than a million sequences in a few hours on a small cluster. Availability and implementation: Source code and binaries are freely available at http://meringlab.org/software/hpc-clust/; the pipeline is implemented in C++ and uses the Message Passing Interface (MPI) standard for distributed computing. Contact: [email protected] Supplementary Information: Supplementary data are available at Bioinformatics onlin

    Functional clues for hypothetical proteins based on genomic context analysis in prokaryotes

    Get PDF
    Three integrated genomic context methods were used to annotate uncharacterized proteins in 102 bacterial genomes. Of 7853 orthologous groups with unknown function containing 45,110 proteins, 1738 groups could be linked to functionally associated partners. In many cases, those partners are uncharacterized themselves (hinting at newly identified modules) or have been described in general terms only. However, we were able to assign pathways, cellular processes or physical complexes for 273 groups (encompassing 3624 previously functionally uncharacterized proteins)

    STRING and STITCH: known and predicted interactions between proteins and chemicals

    Get PDF
    Information on protein-protein and protein-chemical interactions is essential for understanding cellular functions. The STRING and STITCH web resources integrate interaction evidence derived from pathways, automatic literature mining, primary experimental data, and genomic context. The resulting interaction networks cover 1.5 million proteins from 373 organisms and 68,000 chemicals

    Tree reconciliation combined with subsampling improves large scale inference of orthologous group hierarchies

    Full text link
    Background: An orthologous group (OG) comprises a set of orthologous and paralogous genes that share a last common ancestor (LCA). OGs are defined with respect to a chosen taxonomic level, which delimits the position of the LCA in time to a specified speciation event. A hierarchy of OGs expands on this notion, connecting more general OGs, distant in time, to more recent, fine-grained OGs, thereby spanning multiple levels of the tree of life. Large scale inference of OG hierarchies with independently computed taxonomic levels can suffer from inconsistencies between successive levels, such as the position in time of a duplication event. This can be due to confounding genetic signal or algorithmic limitations. Importantly, inconsistencies limit the potential use of OGs for functional annotation and third-party applications. Results: Here we present a new methodology to ensure hierarchical consistency of OGs across taxonomic levels. To resolve an inconsistency, we subsample the protein space of the OG members and perform gene tree-species tree reconciliation for each sampling. Differently from previous approaches, by subsampling the protein space, we avoid the notoriously difficult task of accurately building and reconciling very large phylogenies. We implement the method into a high-throughput pipeline and apply it to the eggNOG database. We use independent protein domain definitions to validate its performance. Conclusion: The presented consistency pipeline shows that, contrary to previous limitations, tree reconciliation can be a useful instrument for the construction of OG hierarchies. The key lies in the combination of sampling smaller trees and aggregating their reconciliations for robustness. Results show comparable or greater performance to previous pipelines. The code is available on Github at: https://github.com/meringlab/og_consistency_pipeline

    Inferring Correlation Networks from Genomic Survey Data

    Get PDF
    High-throughput sequencing based techniques, such as 16S rRNA gene profiling, have the potential to elucidate the complex inner workings of natural microbial communities - be they from the world's oceans or the human gut. A key step in exploring such data is the identification of dependencies between members of these communities, which is commonly achieved by correlation analysis. However, it has been known since the days of Karl Pearson that the analysis of the type of data generated by such techniques (referred to as compositional data) can produce unreliable results since the observed data take the form of relative fractions of genes or species, rather than their absolute abundances. Using simulated and real data from the Human Microbiome Project, we show that such compositional effects can be widespread and severe: in some real data sets many of the correlations among taxa can be artifactual, and true correlations may even appear with opposite sign. Additionally, we show that community diversity is the key factor that modulates the acuteness of such compositional effects, and develop a new approach, called SparCC (available at https://bitbucket.org/yonatanf/sparcc), which is capable of estimating correlation values from compositional data. To illustrate a potential application of SparCC, we infer a rich ecological network connecting hundreds of interacting species across 18 sites on the human body. Using the SparCC network as a reference, we estimated that the standard approach yields 3 spurious species-species interactions for each true interaction and misses 60% of the true interactions in the human microbiome data, and, as predicted, most of the erroneous links are found in the samples with the lowest diversity.United States. Dept. of Energy (Contract DE-AC02-05CH11231

    Pathogenic impact of transcript isoform switching in 1,209 cancer samples covering 27 cancer types using an isoform-specific interaction network

    Full text link
    Under normal conditions, cells of almost all tissue types express the same predominant canonical transcript isoform at each gene locus. In cancer, however, splicing regulation is often disturbed, leading to cancer-specific switches in the most dominant transcripts (MDT). To address the pathogenic impact of these switches, we have analyzed isoform-specific protein-protein interaction disruptions in 1,209 cancer samples covering 27 different cancer types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project of the International Cancer Genomics Consortium (ICGC). Our study revealed large variations in the number of cancer-specific MDT (cMDT) with the highest frequency in cancers of female reproductive organs. Interestingly, in contrast to the mutational load, cancers arising from the same primary tissue had a similar number of cMDT. Some cMDT were found in 100% of all samples in a cancer type, making them candidates for diagnostic biomarkers. cMDT tend to be located at densely populated network regions where they disrupted protein interactions in the proximity of pathogenic cancer genes. A gene ontology enrichment analysis showed that these disruptions occurred mostly in protein translation and RNA splicing pathways. Interestingly, samples with mutations in the spliceosomal complex tend to have higher number of cMDT, while other transcript expressions correlated with mutations in non-coding splice-site and promoter regions of their genes. This work demonstrates for the first time the large extent of cancer-specific alterations in alternative splicing for 27 different cancer types. It highlights distinct and common patterns of cMDT and suggests novel pathogenic transcripts and markers that induce large network disruptions in cancers
    • 

    corecore